{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Handling Categorical Data with Bokeh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.io import show, output_notebook\n",
    "from bokeh.models import CategoricalColorMapper, ColumnDataSource, FactorRange\n",
    "from bokeh.plotting import figure\n",
    "\n",
    "output_notebook()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Basic Bar Plot\n",
    "\n",
    "To create a basic Bar Plot, typically all that is needed is to call `vbar` with `x` and `top`, and values, or `hbar` with `y` and `right` and values. The default `width` or `height` may also be supplied if something different than the default value of 1 is desired. \n",
    "\n",
    "The example below plots vertical bars representing counts for different types of fruit on a categorical range:\n",
    "\n",
    "    x_range = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']\n",
    "counts = [5, 3, 4, 2, 4, 6]\n",
    "\n",
    "p = figure(x_range=fruits, plot_height=250, toolbar_location=None, title=\"Fruit Counts\")\n",
    "p.vbar(x=fruits, top=counts, width=0.9)\n",
    "p.xgrid.grid_line_color = None\n",
    "p.y_range.start = 0\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sorting Bars\n",
    "\n",
    "Bokeh displays the bars in the order the factors are given for the range. So, \"sorting\" bars in a bar plot is identical to sorting the factors for the range. \n",
    "\n",
    "In the example below the fruit factors are sorted in increasing order according to their corresponing counts, causing the bars to be sorted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sorted_fruits = sorted(fruits, key=lambda x: counts[fruits.index(x)])\n",
    "\n",
    "p = figure(x_range=sorted_fruits, plot_height=250, toolbar_location=None, title=\"Fruit Counts\")\n",
    "p.vbar(x=fruits, top=counts, width=0.9)\n",
    "p.xgrid.grid_line_color = None\n",
    "p.y_range.start = 0\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bar Plot with Explicit Colors\n",
    "\n",
    "To set the color of each bar, you can pass explicit color values to the `color` option (which is shorthand for setting both the `fill_color` and `line_color`). \n",
    "\n",
    "In the example below add shading to the previous plot, but now all the data (including the explicit colors) is put inside a `ColumnDataSource` which is passed to `vbar` as the `source` argument. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.palettes import Spectral6\n",
    "\n",
    "fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']\n",
    "counts = [5, 3, 4, 2, 4, 6]\n",
    "\n",
    "source = ColumnDataSource(data=dict(fruits=fruits, counts=counts, color=Spectral6))\n",
    "\n",
    "p = figure(x_range=fruits, plot_height=250, toolbar_location=None, title=\"Fruit Counts\")\n",
    "p.vbar(x='fruits', top='counts', width=0.9, color='color', legend=\"fruits\", source=source)\n",
    "\n",
    "p.xgrid.grid_line_color = None\n",
    "p.y_range.start = 0\n",
    "p.y_range.end = 9\n",
    "p.legend.orientation = \"horizontal\"\n",
    "p.legend.location = \"top_center\"\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bar Plot with Color Mapper\n",
    "\n",
    "Another way to shade bars different colors is to provide a colormapper. The `factor_cmap` transform can be applied to map a categorical value into a colot. Other transorm include `linear_cmap` and `log_cmap` which can be used to map continuous numercical values to colors. \n",
    "\n",
    "The example below reproduces previous example using a `factor_cmap` to convert fruit types into colors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.transform import factor_cmap\n",
    "\n",
    "fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']\n",
    "counts = [5, 3, 4, 2, 4, 6]\n",
    "\n",
    "source = ColumnDataSource(data=dict(fruits=fruits, counts=counts))\n",
    "\n",
    "p = figure(x_range=fruits, plot_height=250, toolbar_location=None, title=\"Fruit Counts\")\n",
    "p.vbar(x='fruits', top='counts', width=0.9, source=source, legend=\"fruits\",\n",
    "       line_color='white', fill_color=factor_cmap('fruits', palette=\"Spectral6\", factors=fruits))\n",
    "\n",
    "p.xgrid.grid_line_color = None\n",
    "p.y_range.start = 0\n",
    "p.y_range.end = 9\n",
    "p.legend.orientation = \"horizontal\"\n",
    "p.legend.location = \"top_center\"\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Grouped Bars (Hierarchical Categories)\n",
    "\n",
    "Often categorical data is arranged into hierarchies, for instance we might have fruit counts, per year. To represent this kind of hierarchy, our range becomes a list of tuples:\n",
    "\n",
    "    x_range = [ (\"Apples\", \"2015\"), (\"Apples\", \"2016\"), (\"Apples\", \"2017\"), ... ]\n",
    "    \n",
    "The coordinates for the bars should be these same tuple values. When we create a hierarchcal range in this way, Bokeh will automatically create a visually grouped axis. \n",
    "\n",
    "The plot below displays fruit counts per year."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']\n",
    "years = ['2015', '2016', '2017']\n",
    "\n",
    "data = {'fruits' : fruits,\n",
    "        '2015'   : [2, 1, 4, 3, 2, 4],\n",
    "        '2016'   : [5, 3, 3, 2, 4, 6],\n",
    "        '2017'   : [3, 2, 4, 4, 5, 3]}\n",
    "\n",
    "# this creates [ (\"Apples\", \"2015\"), (\"Apples\", \"2016\"), (\"Apples\", \"2017\"), (\"Pears\", \"2015), ... ]\n",
    "x = [ (fruit, year) for fruit in fruits for year in years ]\n",
    "counts = sum(zip(data['2015'], data['2016'], data['2017']), ()) # like an hstack\n",
    "\n",
    "source = ColumnDataSource(data=dict(x=x, counts=counts))\n",
    "\n",
    "p = figure(x_range=FactorRange(*x), plot_height=250, \n",
    "           toolbar_location=None, title=\"Fruit Counts by Year\")\n",
    "p.vbar(x='x', top='counts', width=0.9, source=source)\n",
    "\n",
    "p.x_range.range_padding = 0.1\n",
    "p.xgrid.grid_line_color = None\n",
    "p.y_range.start = 0\n",
    "p.xaxis.major_label_orientation = 1\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Grouped Bars with Color Mapper\n",
    "\n",
    "We can combine a color mapper with hierachical ranges, and in fact we can choose to apply a color mapping based on only \"part\" of a categorical factor. \n",
    "\n",
    "In the example below, the arguments `start=1, end=2` are passed to `factor_cmap`. This means that for each factor value (which is a tuple), the value `factor[1:2]` is what shoud be used for colormapping. In this specific case, that translates to shading each bar according to the \"year\" portion."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.transform import factor_cmap\n",
    "\n",
    "fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']\n",
    "years = ['2015', '2016', '2017']\n",
    "\n",
    "data = {'fruits' : fruits,\n",
    "        '2015'   : [2, 1, 4, 3, 2, 4],\n",
    "        '2016'   : [5, 3, 3, 2, 4, 6],\n",
    "        '2017'   : [3, 2, 4, 4, 5, 3]}\n",
    "\n",
    "# this creates [ (\"Apples\", \"2015\"), (\"Apples\", \"2016\"), (\"Apples\", \"2017\"), (\"Pears\", \"2015), ... ]\n",
    "x = [ (fruit, year) for fruit in fruits for year in years ]\n",
    "counts = sum(zip(data['2015'], data['2016'], data['2017']), ()) # like an hstack\n",
    "\n",
    "source = ColumnDataSource(data=dict(x=x, counts=counts))\n",
    "\n",
    "p = figure(x_range=FactorRange(*x), plot_height=250, toolbar_location=None, title=\"Fruit Counts by Year\")\n",
    "p.vbar(x='x', top='counts', width=0.9, source=source, line_color=\"white\",\n",
    "       fill_color=factor_cmap('x', palette=[\"#c9d9d3\", \"#718dbf\", \"#e84d60\"], factors=years, start=1, end=2))\n",
    "\n",
    "p.x_range.range_padding = 0.1\n",
    "p.xgrid.grid_line_color = None\n",
    "p.y_range.start = 0\n",
    "p.xaxis.major_label_orientation = 1\n",
    "\n",
    "show(p)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Grouped Bars with Position Dodge\n",
    "\n",
    "Some times we may wish to have \"grouped\" bars without a visually grouped axis. For instance, we may wish to indicate groups by colormapping or other means. This can be accomplished in Bokeh by providing \"flat\" (i.e. non-tuple) factors, and using the `dodge` transform to shift the bars by an arbitrary amount. \n",
    "\n",
    "The example below also shows fruit counts per year, grouping the bars with `dodge` on the flat categorical range from the original example above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.core.properties import value\n",
    "from bokeh.transform import dodge, factor_cmap\n",
    "\n",
    "fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']\n",
    "years = ['2015', '2016', '2017']\n",
    "\n",
    "data = {'fruits' : fruits,\n",
    "        '2015'   : [2, 1, 4, 3, 2, 4],\n",
    "        '2016'   : [5, 3, 3, 2, 4, 6],\n",
    "        '2017'   : [3, 2, 4, 4, 5, 3]}\n",
    "\n",
    "source = ColumnDataSource(data=data)\n",
    "\n",
    "p = figure(x_range=fruits, plot_height=250, toolbar_location=None, title=\"Fruit Counts by Year\")\n",
    "p.vbar(x=dodge('fruits', -0.25, range=p.x_range), top='2015', width=0.2, source=source, \n",
    "       color=\"#c9d9d3\", legend=value(\"2015\"))\n",
    "p.vbar(x=dodge('fruits',  0.0,  range=p.x_range), top='2016', width=0.2, source=source, \n",
    "       color=\"#718dbf\", legend=value(\"2016\"))\n",
    "p.vbar(x=dodge('fruits',  0.25, range=p.x_range), top='2017', width=0.2, source=source, \n",
    "       color=\"#e84d60\", legend=value(\"2017\"))\n",
    "\n",
    "p.x_range.range_padding = 0.1\n",
    "p.xgrid.grid_line_color = None\n",
    "p.y_range.start = 0\n",
    "p.y_range.end = 10\n",
    "p.legend.location = \"top_left\"\n",
    "p.legend.orientation = \"horizontal\"\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Vertically Stacked Bars\n",
    "\n",
    "We may also wish to stack bars, instead of grouping them. Bokeh provides `vbar_stack` and `hbar_stack` to help with this. To use these functions we pass a list of \"stackers\" which is a sequence of column names for columns in our data source. Each column represents one \"layer\" across all of our stacked bars, and each column is added to the previous columns to position the next layer. \n",
    "\n",
    "The example below shows out fruit counts per year, this time stacked by year instead of grouped."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.core.properties import value\n",
    "from bokeh.models import ColumnDataSource\n",
    "from bokeh.plotting import figure\n",
    "\n",
    "fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']\n",
    "years = [\"2015\", \"2016\", \"2017\"]\n",
    "colors = [\"#c9d9d3\", \"#718dbf\", \"#e84d60\"]\n",
    "\n",
    "data = {'fruits' : fruits,\n",
    "        '2015'   : [2, 1, 4, 3, 2, 4],\n",
    "        '2016'   : [5, 3, 4, 2, 4, 6],\n",
    "        '2017'   : [3, 2, 4, 4, 5, 3]}\n",
    "\n",
    "source = ColumnDataSource(data=data)\n",
    "\n",
    "p = figure(x_range=fruits, plot_height=250,\n",
    "    toolbar_location=None, title=\"Fruit Counts by Year\")\n",
    "\n",
    "p.vbar_stack(years, x='fruits', width=0.9, color=colors, source=source, legend=[value(x) for x in years])\n",
    "\n",
    "p.x_range.range_padding = 0.1\n",
    "p.xgrid.grid_line_color = None\n",
    "p.y_range.start = 0\n",
    "p.legend.location = \"top_left\"\n",
    "p.legend.orientation = \"horizontal\"\n",
    "p.axis.minor_tick_line_color = None\n",
    "p.outline_line_color = None\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Horizontally Stacked Bars\n",
    "\n",
    "The example below uses `hbar_stack` to display exports for each fruit, stacked by year. It also demonstrates that negative stack values are acceptable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.models import ColumnDataSource\n",
    "from bokeh.palettes import GnBu3, OrRd3\n",
    "from bokeh.plotting import figure\n",
    "\n",
    "fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']\n",
    "years = [\"2015\", \"2016\", \"2017\"]\n",
    "\n",
    "exports = {'fruits' : fruits,\n",
    "           '2015'   : [2, 1, 4, 3, 2, 4],\n",
    "           '2016'   : [5, 3, 4, 2, 4, 6],\n",
    "           '2017'   : [3, 2, 4, 4, 5, 3]}\n",
    "imports = {'fruits' : fruits,\n",
    "           '2015'   : [-1, 0, -1, -3, -2, -1],\n",
    "           '2016'   : [-2, -1, -3, -1, -2, -2],\n",
    "           '2017'   : [-1, -2, -1, 0, -2, -2]}\n",
    "\n",
    "p = figure(y_range=fruits, plot_height=250, x_range=(-16, 16), title=\"Fruit import/export, by year\",\n",
    "           toolbar_location=None)\n",
    "\n",
    "p.hbar_stack(years, y='fruits', height=0.9, color=GnBu3, source=ColumnDataSource(exports),\n",
    "             legend=[\"%s exports\" % x for x in years])\n",
    "\n",
    "p.hbar_stack(years, y='fruits', height=0.9, color=OrRd3, source=ColumnDataSource(imports),\n",
    "             legend=[\"%s imports\" % x for x in years])\n",
    "\n",
    "p.y_range.range_padding = 0.1\n",
    "p.ygrid.grid_line_color = None\n",
    "p.legend.location = \"top_left\"\n",
    "p.axis.minor_tick_line_color = None\n",
    "p.outline_line_color = None\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Grouped Bars with Line (Mixed Category Levels)\n",
    "\n",
    "Whenever we use hierarchical categories, it is possible to use coordinates that refer to only the first portions of a factor. In this case, coordinates are centered inside the group appropriately. \n",
    "\n",
    "The example below uses bars to show sales values for every month, grouped by quarter. Each bar has coordinates such as `(\"Q1\", \"jan\")`, etc. Additionally a line displays the quarterly average trends, by using coordinates such as `\"Q1\"`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "factors = [\n",
    "    (\"Q1\", \"jan\"), (\"Q1\", \"feb\"), (\"Q1\", \"mar\"),\n",
    "    (\"Q2\", \"apr\"), (\"Q2\", \"may\"), (\"Q2\", \"jun\"),\n",
    "    (\"Q3\", \"jul\"), (\"Q3\", \"aug\"), (\"Q3\", \"sep\"),\n",
    "    (\"Q4\", \"oct\"), (\"Q4\", \"nov\"), (\"Q4\", \"dec\"),\n",
    "\n",
    "]\n",
    "\n",
    "p = figure(x_range=FactorRange(*factors), plot_height=250,\n",
    "           toolbar_location=None, tools=\"\")\n",
    "\n",
    "x = [ 10, 12, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16 ]\n",
    "p.vbar(x=factors, top=x, width=0.9, alpha=0.5)\n",
    "\n",
    "p.line(x=[\"Q1\", \"Q2\", \"Q3\", \"Q4\"], y=[12, 9, 13, 14], color=\"red\", line_width=2)\n",
    "\n",
    "p.y_range.start = 0\n",
    "p.x_range.range_padding = 0.1\n",
    "p.xaxis.major_label_orientation = 1\n",
    "p.xgrid.grid_line_color = None\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Stacked and Grouped Bars\n",
    "\n",
    "The above techiques for stacking and grouping may also be used together to create a stacked, grouped bar plot. \n",
    "\n",
    "Continuing the example above, we might stack each individual bar by region."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p = figure(x_range=FactorRange(*factors), plot_height=250,\n",
    "           toolbar_location=None, tools=\"\")\n",
    "\n",
    "regions = ['east', 'west']\n",
    "\n",
    "source = ColumnDataSource(data=dict(\n",
    "    x=factors,\n",
    "    east=[ 5, 5, 6, 5, 5, 4, 5, 6, 7, 8, 6, 9 ],\n",
    "    west=[ 5, 7, 9, 4, 5, 4, 7, 7, 7, 6, 6, 7 ],\n",
    "))\n",
    "\n",
    "p.vbar_stack(regions, x='x', width=0.9, alpha=0.5, color=[\"blue\", \"red\"], source=source,\n",
    "             legend=[value(x) for x in regions])\n",
    "\n",
    "p.y_range.start = 0\n",
    "p.y_range.end = 18\n",
    "p.x_range.range_padding = 0.1\n",
    "p.xaxis.major_label_orientation = 1\n",
    "p.xgrid.grid_line_color = None\n",
    "p.legend.location = \"top_center\"\n",
    "p.legend.orientation = \"horizontal\"\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interval Plot\n",
    "\n",
    "So far we have used bar glyphs to create bar charts starting from a common baseline, but bars are also useful for displaying arbitrary intervals. \n",
    "\n",
    "The example below shows the low/high time spread for sprint medalists in each year of the olympics. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.sampledata.sprint import sprint\n",
    "\n",
    "sprint.Year = sprint.Year.astype(str)\n",
    "\n",
    "group = sprint.groupby('Year')\n",
    "\n",
    "source = ColumnDataSource(group)\n",
    "\n",
    "p = figure(y_range=group, x_range=(9.5,12.7), plot_width=400, plot_height=550, toolbar_location=None, \n",
    "           title=\"Time Spreads for Sprint Medalists (by Year)\")\n",
    "p.ygrid.grid_line_color = None\n",
    "p.xaxis.axis_label = \"Time (seconds)\"\n",
    "p.outline_line_color = None\n",
    "\n",
    "p.hbar(y=\"Year\", left='Time_min', right='Time_max', height=0.4, source=source)\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pandas to Simple Bars\n",
    "\n",
    "Although Pandas is not required to use Bokeh, using Pandas can make many things simpler. For instance, Pandas `GroupBy` objects can be passed as the `source` argument to a glyph (or used to initialize a `ColumnDataSource`. When this is done, summary statistics for each group are automatically availanle in the data source.\n",
    "\n",
    "In the example below we pass `autompg.groupby(('cyl'))` as our source. Since the \"autompg\" DataFrame has and `mpg` column, our grouped data source automatically has an `mpg_mean` column we can use to drive glyphs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.sampledata.autompg import autompg_clean as df\n",
    "\n",
    "df.cyl = df.cyl.astype(str)\n",
    "df.yr = df.yr.astype(str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.palettes import Spectral5\n",
    "from bokeh.transform import factor_cmap\n",
    "\n",
    "group = df.groupby(('cyl'))\n",
    "\n",
    "source = ColumnDataSource(group)\n",
    "cyl_cmap = factor_cmap('cyl', palette=Spectral5, factors=sorted(df.cyl.unique()))\n",
    "\n",
    "p = figure(plot_height=350, x_range=group, toolbar_location=None)\n",
    "p.vbar(x='cyl', top='mpg_mean', width=1, line_color=\"white\", \n",
    "       fill_color=cyl_cmap, source=source)\n",
    "\n",
    "p.xgrid.grid_line_color = None\n",
    "p.y_range.start = 0\n",
    "p.xaxis.axis_label = \"some stuff\"\n",
    "p.xaxis.major_label_orientation = 1.2\n",
    "p.outline_line_color = None \n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pandas to Grouped Bars\n",
    "\n",
    "We can also pass Pandas `GroupBy` objects as plot ranges. When this happens, Bokeh automatically creates a hierarchical nested axis.\n",
    "\n",
    "The example below creates a doubly nested range."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.models import HoverTool\n",
    "from bokeh.palettes import Spectral5\n",
    "from bokeh.transform import factor_cmap\n",
    "\n",
    "group = df.groupby(by=['cyl', 'mfr'])\n",
    "\n",
    "source = ColumnDataSource(group)\n",
    "index_cmap = factor_cmap('cyl_mfr', palette=Spectral5, factors=sorted(df.cyl.unique()), end=1)\n",
    "\n",
    "p = figure(plot_width=900, plot_height=400, x_range=group, toolbar_location=None, \n",
    "           title=\"Mean MPG by # Cylinders and Manufacturer\")\n",
    "p.vbar(x='cyl_mfr', top='mpg_mean', width=1, line_color=\"white\", \n",
    "       fill_color=index_cmap, source=source)\n",
    "\n",
    "p.x_range.range_padding = 0.05\n",
    "p.xgrid.grid_line_color = None\n",
    "p.y_range.start = 0\n",
    "p.xaxis.axis_label = \"Manufacturer grouped by # Cylinders\"\n",
    "p.xaxis.major_label_orientation = 1.2\n",
    "p.outline_line_color = None \n",
    "\n",
    "p.add_tools(HoverTool(tooltips=[(\"MPG\", \"@mpg_mean\"), (\"Cyl, Mfr\", \"@cyl_mfr\")]))\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Categorical Scatter with Jitter\n",
    "\n",
    "So far we have mostly plotted bars on categorical ranges, but other glyphs work as well. For instance we could plot a scatter plot of circles against a categorical range. Often times, such plots are improved by jittering the data along the categorical range. Bokeh provides a `jitter` transform that can accomplish that. \n",
    "\n",
    "The example below shows an individual GitHub commit history grouped by day of the week, and jittered to improve readability. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from bokeh.transform import jitter\n",
    "from bokeh.sampledata.commits import data\n",
    "\n",
    "DAYS = ['Sun', 'Sat', 'Fri', 'Thu', 'Wed', 'Tue', 'Mon']\n",
    "\n",
    "\n",
    "source = ColumnDataSource(data)\n",
    "\n",
    "p = figure(plot_width=800, plot_height=300, y_range=DAYS, x_axis_type='datetime', \n",
    "           title=\"Commits by Time of Day (US/Central) 2012—2016\")\n",
    "p.circle(x='time', y=jitter('day', width=0.6, range=p.y_range),  source=source, alpha=0.3)\n",
    "\n",
    "p.xaxis[0].formatter.days = ['%Hh']\n",
    "p.x_range.range_padding = 0\n",
    "p.ygrid.grid_line_color = None\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively we might show the same data using bars, only giving a count per day."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "group = data.groupby('day')\n",
    "source = ColumnDataSource(group)\n",
    "\n",
    "p = figure(plot_width=800, plot_height=300, y_range=DAYS, x_range=(0, 1010), \n",
    "           title=\"Commits by Day of the Week, 2012—2016\", toolbar_location=None)\n",
    "p.hbar(y='day', right='time_count', height=0.9, source=source)\n",
    "\n",
    "p.ygrid.grid_line_color = None\n",
    "p.outline_line_color = None\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Categorical Heatmaps\n",
    "\n",
    "Another kind of common categorical plot is the Categorical Heatmap, which has categorical ranges on both axes. Typically colormapped or shaded rectangles are diplayed for each *(x,y)* categorical comnination. \n",
    "\n",
    "The examples below demonstrates a catgorical heatmap using unemployment data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "from bokeh.io import show\n",
    "from bokeh.models import BasicTicker, ColorBar, ColumnDataSource, LinearColorMapper, PrintfTickFormatter\n",
    "from bokeh.plotting import figure\n",
    "from bokeh.sampledata.unemployment1948 import data\n",
    "from bokeh.transform import transform\n",
    "\n",
    "data.Year = data.Year.astype(str)\n",
    "data = data.set_index('Year')\n",
    "data.drop('Annual', axis=1, inplace=True)\n",
    "data.columns.name = 'Month'\n",
    "\n",
    "# reshape to 1D array or rates with a month and year for each row.\n",
    "df = pd.DataFrame(data.stack(), columns=['rate']).reset_index()\n",
    "\n",
    "source = ColumnDataSource(df)\n",
    "\n",
    "# this is the colormap from the original NYTimes plot\n",
    "colors = [\"#75968f\", \"#a5bab7\", \"#c9d9d3\", \"#e2e2e2\", \"#dfccce\", \"#ddb7b1\", \"#cc7878\", \"#933b41\", \"#550b1d\"]\n",
    "mapper = LinearColorMapper(palette=colors, low=df.rate.min(), high=df.rate.max())\n",
    "\n",
    "p = figure(title=\"US Unemployment 1948—2016\", toolbar_location=None, tools=\"\",\n",
    "           x_range=list(data.index), y_range=list(reversed(data.columns)),\n",
    "           x_axis_location=\"above\", plot_width=900, plot_height=400)\n",
    "\n",
    "p.axis.axis_line_color = None\n",
    "p.axis.major_tick_line_color = None\n",
    "p.axis.major_label_text_font_size = \"5pt\"\n",
    "p.axis.major_label_standoff = 0\n",
    "p.xaxis.major_label_orientation = 1.0\n",
    "\n",
    "p.rect(x=\"Year\", y=\"Month\", width=1, height=1, source=source,\n",
    "       line_color=None, fill_color=transform('rate', mapper))\n",
    "\n",
    "color_bar = ColorBar(color_mapper=mapper, location=(0, 0),\n",
    "                     ticker=BasicTicker(desired_num_ticks=len(colors)),\n",
    "                     formatter=PrintfTickFormatter(format=\"%d%%\"))\n",
    "\n",
    "p.add_layout(color_bar, 'right')\n",
    "\n",
    "show(p)    \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to heatmaps that use colormapping to shade each rectangle, a similar technique can be used to create various kinds of illustrations, for instance the example below uses Bokeh to make an interactive periodic table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.io import output_file, show\n",
    "from bokeh.models import ColumnDataSource, HoverTool\n",
    "from bokeh.plotting import figure\n",
    "from bokeh.sampledata.periodic_table import elements\n",
    "from bokeh.transform import dodge, factor_cmap\n",
    "\n",
    "periods = [\"I\", \"II\", \"III\", \"IV\", \"V\", \"VI\", \"VII\"]\n",
    "groups = [str(x) for x in range(1, 19)]\n",
    "\n",
    "df = elements.copy()\n",
    "df[\"atomic mass\"] = df[\"atomic mass\"].astype(str)\n",
    "df[\"group\"] = df[\"group\"].astype(str)\n",
    "df[\"period\"] = [periods[x-1] for x in df.period]\n",
    "df = df[df.group != \"-\"]\n",
    "df = df[df.symbol != \"Lr\"]\n",
    "df = df[df.symbol != \"Lu\"]\n",
    "\n",
    "cmap = {\n",
    "    \"alkali metal\"         : \"#a6cee3\",\n",
    "    \"alkaline earth metal\" : \"#1f78b4\",\n",
    "    \"metal\"                : \"#d93b43\",\n",
    "    \"halogen\"              : \"#999d9a\",\n",
    "    \"metalloid\"            : \"#e08d79\",\n",
    "    \"noble gas\"            : \"#eaeaea\",\n",
    "    \"nonmetal\"             : \"#f1d4Af\",\n",
    "    \"transition metal\"     : \"#599d7A\",\n",
    "}\n",
    "\n",
    "source = ColumnDataSource(df)\n",
    "\n",
    "p = figure(title=\"Periodic Table (omitting LA and AC Series)\", plot_width=900, plot_height=500, \n",
    "           tools=\"\", toolbar_location=None,\n",
    "           x_range=groups, y_range=list(reversed(periods)))\n",
    "\n",
    "p.rect(\"group\", \"period\", 0.95, 0.95, source=source, fill_alpha=0.6, legend=\"metal\",\n",
    "       color=factor_cmap('metal', palette=list(cmap.values()), factors=list(cmap.keys())))\n",
    "\n",
    "text_props = {\"source\": source, \"text_align\": \"left\", \"text_baseline\": \"middle\"}\n",
    "\n",
    "x = dodge(\"group\", -0.4, range=p.x_range)\n",
    "\n",
    "r = p.text(x=x, y=\"period\", text=\"symbol\", **text_props)\n",
    "r.glyph.text_font_style=\"bold\"\n",
    "\n",
    "r = p.text(x=x, y=dodge(\"period\", 0.3, range=p.y_range), text=\"atomic number\", **text_props)\n",
    "r.glyph.text_font_size=\"8pt\"\n",
    "\n",
    "r = p.text(x=x, y=dodge(\"period\", -0.35, range=p.y_range), text=\"name\", **text_props)\n",
    "r.glyph.text_font_size=\"5pt\"\n",
    "\n",
    "r = p.text(x=x, y=dodge(\"period\", -0.2, range=p.y_range), text=\"atomic mass\", **text_props)\n",
    "r.glyph.text_font_size=\"5pt\"\n",
    "\n",
    "p.text(x=[\"3\", \"3\"], y=[\"VI\", \"VII\"], text=[\"LA\", \"AC\"], text_align=\"center\", text_baseline=\"middle\")\n",
    "\n",
    "p.add_tools(HoverTool(tooltips = [\n",
    "    (\"Name\", \"@name\"),\n",
    "    (\"Atomic number\", \"@{atomic number}\"),\n",
    "    (\"Atomic mass\", \"@{atomic mass}\"),\n",
    "    (\"Type\", \"@metal\"),\n",
    "    (\"CPK color\", \"$color[hex, swatch]:CPK\"),\n",
    "    (\"Electronic configuration\", \"@{electronic configuration}\"),\n",
    "]))\n",
    "\n",
    "p.outline_line_color = None\n",
    "p.grid.grid_line_color = None\n",
    "p.axis.axis_line_color = None\n",
    "p.axis.major_tick_line_color = None\n",
    "p.axis.major_label_standoff = 0\n",
    "p.legend.orientation = \"horizontal\"\n",
    "p.legend.location =\"top_center\"\n",
    "\n",
    "show(p)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ridge Plot (Categorical Offsets)\n",
    "\n",
    "We have seen above how the `dodge` transform can be used to shift an entire column of categorical values. But it is possible to offset individual coordinates but putting the offset at the end of a tuple with a factor. For instance, if we have catefories `\"foo\"` and `\"bar\"` then \n",
    "\n",
    "    (\"foo\", 0.1), (\"foo\", 0.2), (\"bar\", -0.3)\n",
    "\n",
    "Are all examples of individual coordinates shifted on a per-coordinate basis. \n",
    "\n",
    "This technique can be used to create \"Ridge Plots\" which show lines (or filled areas) for different categories. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import colorcet as cc\n",
    "from numpy import linspace\n",
    "from scipy.stats.kde import gaussian_kde\n",
    "\n",
    "from bokeh.sampledata.perceptions import probly\n",
    "from bokeh.models import FixedTicker, PrintfTickFormatter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "probly.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def ridge(category, data, scale=20):\n",
    "    ''' For a given category and timeseries for that category, return categorical\n",
    "    coordiantes with offsets scaled by the timeseries. \n",
    "    \n",
    "    '''\n",
    "    return list(zip([category]*len(data), scale*data))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cats = list(reversed(probly.keys()))\n",
    "\n",
    "palette = [cc.rainbow[i*15] for i in range(17)]\n",
    "\n",
    "x = linspace(-20,110, 500)\n",
    "\n",
    "source = ColumnDataSource(data=dict(x=x))\n",
    "\n",
    "p = figure(y_range=cats, plot_width=900, x_range=(-5, 105), toolbar_location=None)\n",
    "\n",
    "for i, cat in enumerate(reversed(cats)):\n",
    "    pdf = gaussian_kde(probly[cat])\n",
    "    y = ridge(cat, pdf(x))\n",
    "    source.add(y, cat)\n",
    "    p.patch('x', cat, color=palette[i], alpha=0.6, line_color=\"black\", source=source)\n",
    "    \n",
    "p.outline_line_color = None\n",
    "p.background_fill_color = \"#efefef\"\n",
    "\n",
    "p.xaxis.ticker = FixedTicker(ticks=list(range(0, 101, 10)))\n",
    "p.xaxis.formatter = PrintfTickFormatter(format=\"%d%%\")\n",
    "\n",
    "p.ygrid.grid_line_color = None\n",
    "p.xgrid.grid_line_color = \"#dddddd\"\n",
    "p.xgrid.ticker = p.xaxis[0].ticker\n",
    "\n",
    "p.axis.minor_tick_line_color = None\n",
    "p.axis.major_tick_line_color = None\n",
    "p.axis.axis_line_color = None\n",
    "\n",
    "p.y_range.range_padding = 0.12\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
